
libroach,engine: support pagination of ExportToSst #44440

Merged
ajwerner merged 1 commit into cockroachdb:master from ajwerner/paginate-export-request on Jan 28, 2020

Conversation

ajwerner
Contributor

This commit extends the engine interface to take a targetSize parameter in
the ExportToSst method. The iteration will stop if the first version of a key
to be added to the SST would lead to targetSize being exceeded. If
exportAllRevisions is false, the targetSize will not be exceeded unless the
first kv pair exceeds it.

This commit additionally fixes a bug in the rocksdb implementation of
DBExportToSst whereby the first key in the export request would be skipped.
This case likely never occurred because the key passed to Export was rarely
exactly the first key to be included (see the change related to seek_key in
db.cc).

The exportccl.TestRandomKeyAndTimestampExport test was extended to exercise
various targetSize limits. Running that test under stress with the tee engine
inspires some confidence, and it did catch the above-mentioned bug. More
testing would likely be good.

This commit leaves the task of adopting the targetSize parameter for later.

Fixes #39717.

Release note: None
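
For illustration, here is a rough Go sketch of the stopping rule described above. The names mirror the engine code quoted in the review below, but this is a simplified sketch, not the actual implementation:

  package engine

  import "bytes"

  type kvPair struct{ Key, Value []byte }

  // exportSketch walks kv pairs in key order and stops at a new-key boundary
  // if admitting that key would push the SST past targetSize. The curSize > 0
  // check guarantees the first pair is always admitted, so a single oversized
  // pair can still exceed the target.
  func exportSketch(kvs []kvPair, exportAllRevisions bool, targetSize uint64) (resumeKey []byte) {
    paginated := targetSize > 0
    var curKey []byte
    var curSize int64
    for _, kv := range kvs {
      isNewKey := !exportAllRevisions || !bytes.Equal(kv.Key, curKey)
      newSize := curSize + int64(len(kv.Key)+len(kv.Value))
      if paginated && isNewKey && curSize > 0 && uint64(newSize) > targetSize {
        resumeKey = append([]byte(nil), kv.Key...) // where the next call starts
        break
      }
      if exportAllRevisions && isNewKey {
        curKey = append(curKey[:0], kv.Key...)
      }
      // ... the real code adds kv to the SST writer here ...
      curSize = newSize
    }
    return resumeKey
  }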

@ajwerner ajwerner added the A-storage Relating to our storage engine (Pebble) on-disk storage. label Jan 28, 2020
@cockroach-teamcity
Member

This change is Reviewable

@ajwerner
Contributor Author

@itsbilal lmk if there's somebody better to review this. One improvement I'm considering is going through and naming the return param on the engine.Reader.ExportToSst method to make it more explicitly a resumeKey.
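
For example, a hedged sketch of what that could look like (parameter list abbreviated and hypothetical):

  // Hypothetical, abbreviated signature for illustration only: the named
  // result documents that a non-nil returned key is where a paginated
  // caller should resume.
  ExportToSst(
    start, end MVCCKey, exportAllRevisions bool, targetSize uint64,
  ) (sst []byte, resumeKey roachpb.Key, err error)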

Collaborator

@petermattis petermattis left a comment


I did a quick first pass and left a few comments on the C++. Not a full review.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @itsbilal)


c-deps/libroach/db.cc, line 1100 at r1 (raw file):

  // versions of a key to the writer.
  bool paginated = target_size > 0;
  std::string cur_key ("");

The explicit constructor call is unnecessary: s/ ("")//g.


c-deps/libroach/db.cc, line 1104 at r1 (raw file):

  // Seek to the MVCC metadata key for the provided start key and let the
  // incremental iterator find the appropriate version.
  DBKey seek_key = (DBKey){ .key = start.key };

Huh, this is some weird new C++ syntax. Are field initializers like that part of C++ now? I know that was a gcc extension for a while. Doing some googling, it isn't clear to me if the (DBKey) cast is necessary. I think that can be removed. Maybe.

@ajwerner ajwerner force-pushed the ajwerner/paginate-export-request branch from 2c7373f to 0186f41 Compare January 28, 2020 15:00
Contributor Author

@ajwerner ajwerner left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal and @petermattis)


c-deps/libroach/db.cc, line 1100 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

The explicit constructor call is unnecessary: s/ ("")//g.

Done.


c-deps/libroach/db.cc, line 1104 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Huh, this is some weird new C++ syntax. Are field initializers like that part of C++ now? I know that was a gcc extension for a while. Doing some googling, it isn't clear to me if the (DBKey) cast is necessary. I think that can be removed. Maybe.

It's kosher as of C99, so I assumed it would just work in C++. You're right about the cast. Removed.

If there's a more idiomatic way of constructing the DBKey with a zero timestamp, let me know. In Go we call MakeMVCCMetadataKey.
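
For reference, a rough Go sketch of that idiom, with types simplified for illustration (the real ones live in the engine and hlc packages):

  type Timestamp struct {
    WallTime int64
    Logical  int32
  }

  type MVCCKey struct {
    Key       []byte
    Timestamp Timestamp // the zero value denotes the versionless metadata key
  }

  // MakeMVCCMetadataKey pairs a raw key with the zero timestamp, so a seek
  // lands at the start of that key's versions.
  func MakeMVCCMetadataKey(key []byte) MVCCKey {
    return MVCCKey{Key: key}
  }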

@ajwerner ajwerner force-pushed the ajwerner/paginate-export-request branch from 0186f41 to b95fb60 Compare January 28, 2020 16:03
Collaborator

@petermattis petermattis left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @itsbilal)


c-deps/libroach/db.cc, line 1104 at r1 (raw file):

Previously, ajwerner wrote…

It's kosher as of C99, so I assumed it would just work in C++. You're right about the cast. Removed.

If there's a more idiomatic way of constructing the DBKey with a zero timestamp, let me know. In Go we call MakeMVCCMetadataKey.

This idiom looks good to me.


c-deps/libroach/db.cc, line 1099 at r2 (raw file):

  // SST then we need to keep track of when we've finished adding all of the
  // versions of a key to the writer.
  bool paginated = target_size > 0;

Might want to sprinkle some const declarations around. Looks like this variable can be const, and some of the ones in the loop below.

Member

@itsbilal itsbilal left a comment


:lgtm: save for some minor comments.

Reviewed 8 of 10 files at r1, 2 of 2 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner)


c-deps/libroach/db.cc, line 1141 at r2 (raw file):

    int64_t new_size = cur_size + decoded_key.size() + iter.value().size();
    bool is_over_target = paginated && cur_size > 0 && new_size > target_size;
    if (is_new_key && is_over_target) {

Nit: it might be a good idea to add paginated && to this conditional and remove it from is_over_target. That makes it clearer that this branch only runs in the paginated case, just like the conditional above. The boolean logic here (and the interactions between export_all_revisions, paginated, etc.) is confusing enough.
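
The same shape exists in the Go implementation (see the pebble.go comment further below); as a sketch, the restructured check would read:

  // Keep isOverTarget purely about size and surface the pagination guard in
  // the branch itself.
  isOverTarget := curSize > 0 && uint64(newSize) > targetSize
  if paginated && isNewKey && isOverTarget {
    // stop before adding this key and record it as the resume point
  }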


c-deps/libroach/db.cc, line 1170 at r2 (raw file):

  // If we're not returning an error, check to see if we need to return the resume key.
  if (res.data == NULL && resume_key.length() > 0) {

Maybe s/res.data == NULL/res.ok()/ ? That's how we usually check for non-error DBStatuses.


c-deps/libroach/db.cc, line 1171 at r2 (raw file):

  // If we're not returning an error, check to see if we need to return the resume key.
  if (res.data == NULL && resume_key.length() > 0) {
    *resume = ToDBString(rocksdb::Slice(resume_key));

You should also be able to skip the resume_key conversion from std::string to rocksdb::Slice; ToDBString should still work. RocksDB passes the two types around interchangeably a lot.


pkg/storage/engine/pebble.go, line 1153 at r2 (raw file):

		}
		unsafeValue := iter.UnsafeValue()
		isNewKey := !exportAllRevisions || unsafeKey.Key.Compare(curKey) != 0

bytes.Equal(unsafeKey.Key, curKey) is a bit more performant and easier to read
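
A minimal runnable comparison of the two forms:

  package main

  import (
    "bytes"
    "fmt"
  )

  func main() {
    curKey := []byte("a/1")
    key := []byte("a/1")
    // bytes.Equal answers only equality, so it can be cheaper than a full
    // three-way comparison, and the intent is clearer at the call site.
    fmt.Println(bytes.Equal(key, curKey))        // true
    fmt.Println(bytes.Compare(key, curKey) != 0) // false
  }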


pkg/storage/engine/pebble.go, line 1169 at r2 (raw file):

			isOverTarget := paginated && curSize > 0 && uint64(newSize) > targetSize
			if isNewKey && isOverTarget {
				resumeKey = append(roachpb.Key{}, unsafeKey.Key...) // allocate the right size

Unsure of what the comment is referring to - do you want to allocate exactly len(unsafeKey.Key) bytes? In that case why not make([]byte, len(unsafeKey.Key)) and then copy(...) ?

@ajwerner ajwerner force-pushed the ajwerner/paginate-export-request branch from b95fb60 to 05f91e5 Compare January 28, 2020 19:28
Contributor Author

@ajwerner ajwerner left a comment


Thanks for the prompt reviews!

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner, @itsbilal, and @petermattis)


c-deps/libroach/db.cc, line 1099 at r2 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Might want to sprinkle some const declarations around. Looks like this variable can be const, and some of the ones in the loop below.

Done.


c-deps/libroach/db.cc, line 1121 at r2 (raw file):

    }

    bool is_new_key = !export_all_revisions || decoded_key.compare(rocksdb::Slice(cur_key)) != 0;

turns out I didn't need this rocksdb::Slice either.


c-deps/libroach/db.cc, line 1141 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Nit: it might be a good idea to add paginated && to this conditional and remove it from is_over_target. That makes it clearer that this branch only runs in the paginated case, just like the conditional above. The boolean logic here (and the interactions between export_all_revisions, paginated, etc.) is confusing enough.

Done.


c-deps/libroach/db.cc, line 1170 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Maybe s/res.data == NULL/res.ok()/ ? That's how we usually check for non-error DBStatuses.

Erm, that method seems to be on rocksdb::Status; this is just a DBStatus. The comment there says "If DBStatus.data == NULL the operation succeeded." I could use status here, but it's a little bit less direct. We also already do the same check above on status.


c-deps/libroach/db.cc, line 1171 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

You should also be able to skip the resume_key conversion from std::string to rocksdb::Slice; ToDBString should still work. RocksDB passes the two types around interchangeably a lot.

Done.


pkg/storage/engine/pebble.go, line 1153 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

bytes.Equal(unsafeKey.Key, curKey) is a bit more performant and easier to read

Ha, forgot about bytes.Equal after the C++.


pkg/storage/engine/pebble.go, line 1169 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Unsure of what the comment is referring to - do you want to allocate exactly len(unsafeKey.Key) bytes? In that case why not make([]byte, len(unsafeKey.Key)) and then copy(...) ?

See how this makes you feel.

Member

@itsbilal itsbilal left a comment


:lgtm_strong:

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @itsbilal, and @petermattis)


c-deps/libroach/db.cc, line 1170 at r2 (raw file):

Previously, ajwerner wrote…

Erm, that method seems to be on rocksdb::Status; this is just a DBStatus. The comment there says "If DBStatus.data == NULL the operation succeeded." I could use status here, but it's a little bit less direct. We also already do the same check above on status.

My bad, forgot this was DBStatus. Sounds good then.


pkg/storage/engine/pebble.go, line 1169 at r2 (raw file):

Previously, ajwerner wrote…

See how this makes you feel.

Looks better - letting append grow the slice from 0 usually means the capacity ends up being some power of 2, while pre-sizing allocates exactly the length you want.
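
A small sketch of the allocation difference (values illustrative):

  package main

  import "fmt"

  func main() {
    k := []byte("some/resume/key") // 15 bytes

    // Appending onto a nil slice lets the runtime pick the capacity,
    // typically rounding up to an allocation size class.
    grown := append([]byte(nil), k...)

    // Pre-sizing allocates a backing array of exactly len(k) bytes.
    exact := append(make([]byte, 0, len(k)), k...)

    // The make-then-copy form suggested above is equivalent.
    exact2 := make([]byte, len(k))
    copy(exact2, k)

    fmt.Println(cap(grown), cap(exact), cap(exact2)) // e.g. 16 15 15
  }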

Contributor Author

@ajwerner ajwerner left a comment


bors r+

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @itsbilal, and @petermattis)

craig bot pushed a commit that referenced this pull request Jan 28, 2020
44440: libroach,engine: support pagination of ExportToSst r=ajwerner a=ajwerner


Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
@craig
Contributor

craig bot commented Jan 28, 2020

Build succeeded

@craig craig bot merged commit 05f91e5 into cockroachdb:master Jan 28, 2020
ajwerner added a commit to ajwerner/cockroach that referenced this pull request Jan 30, 2020
In cockroachdb#44440 we added a `targetSize` parameter to enable pagination of export
requests. In that PR we defined `targetSize` such that iteration returns just
before the key that would cause `targetSize` to be exceeded. This definition is
unfortunate when thinking about a total size limit for pagination in the
DistSender (which we'll add when cockroachdb#44341 comes in). Imagine a case where
we set a total byte limit of 1MB and a file byte limit of 1MB. That setting
should lead to at most a single file being emitted (assuming one range holds
enough data). Under the previous definition, we'd create a file just below 1MB,
and the DistSender would then need to send another request containing a tiny
amount of data. This change brings the behavior in line with
the semantics introduced in cockroachdb#44341 for ScanRequests and is just easier to
reason about.

Release note: None
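
A hedged sketch of the contrast (names and the exact boundary comparisons are illustrative):

  // Rule from this PR (cockroachdb#44440): stop *before* the key whose
  // addition would push the SST past targetSize, so emitted files stay at
  // or under the target.
  func stopBefore(isNewKey bool, curSize, newSize, targetSize uint64) bool {
    return isNewKey && curSize > 0 && newSize > targetSize
  }

  // Redefined rule: targetSize becomes the size above which iteration stops,
  // so the file that crosses the target is still emitted in full.
  func stopAfter(isNewKey bool, curSize, targetSize uint64) bool {
    return isNewKey && curSize > targetSize
  }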
craig bot pushed a commit that referenced this pull request Jan 30, 2020
44489: sql: bugfixes around writing the old primary key in pk changes r=rohany a=rohany

This PR fixes two bugs:
* The logic for when to rewrite the old primary key was broken,
  resulting in the old primary key not being rewritten in many cases.
* The old primary key being created was not properly dealing with
  its dangling interleave information.

This PR makes the design decision to not re-interleave the copy of the old primary index if it was interleaved.

Release note: None

44551: jobs: always set start time of a job r=spaskob a=spaskob

When starting a job via CreateAndStartJob, if
marking the job as started fails, the job will still be
in system.jobs and can be adopted by another
node later, but its start time will be 0 in that
case. We add a check and set it if necessary.

Release note: none.

44553: engine: redefine targetSize and add maxSize to ExportToSst r=itsbilal a=ajwerner

This PR is a follow-up to #44440, motivated by problems unearthed while working on #44482. The PR comes in two commits:

1) Re-define `targetSize` from being a target below which most requests would remain to being the size above which the export stops.
2) Add a `maxSize` parameter above which the `ExportToSst` call will fail.

See the individual commits for more details. 

Co-authored-by: Rohan Yadav <rohany@alumni.cmu.edu>
Co-authored-by: Spas Bojanov <spas@cockroachlabs.com>
Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
ajwerner added a commit to ajwerner/cockroach that referenced this pull request Feb 3, 2020
This commit adopts the API change in cockroachdb#44440 and the previous commit. It adds
a hidden cluster setting to control the target size. There's not a ton of
testing but there's some.

Further work includes:

 * Adding a mechanism to limit the number of files returned from an
   ExportRequest for use in CDC backfills. This is currently blocked
   on cockroachdb#44341.

I'm omitting a release note because the setting is hidden.

Release note: None.
ajwerner added a commit to ajwerner/cockroach that referenced this pull request Feb 7, 2020
This commit adopts the API change in cockroachdb#44440 and the previous commit. It adds
a hidden cluster setting to control the target size. There's not a ton of
testing but there's some.

I'm omitting a release note because the setting is hidden.

Release note: None.
craig bot pushed a commit that referenced this pull request Feb 7, 2020
44482: storageccl: add setting to control export file size and paginate export r=pbardea a=ajwerner

This commit adopts the API change in #44440. It adds a hidden cluster setting
to control the target size. The testing is minimal.

This PR comes in two commits:

1) Add a parameter to the ExportSST request to control the target size
2) Add two cluster settings
   * Control the target size
   * Control how much over the target size before an error is generated

Closes #43356

Release note: None.

Co-authored-by: Andrew Werner <ajwerner@cockroachlabs.com>
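
As a rough sketch of what such hidden settings look like (keys, descriptions, and defaults here are assumptions for illustration, not necessarily what the PR added):

  import "github.com/cockroachdb/cockroach/pkg/settings"

  // Hypothetical setting names and defaults, for illustration only.
  var exportTargetSize = settings.RegisterByteSizeSetting(
    "kv.bulk_sst.target_size",
    "target size for SSTs emitted from export requests",
    64<<20, // 64 MiB
  )

  var exportAllowedOverage = settings.RegisterByteSizeSetting(
    "kv.bulk_sst.max_allowed_overage",
    "size in excess of the target above which an export request errors",
    64<<20,
  )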
Labels
A-storage Relating to our storage engine (Pebble) on-disk storage.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

storage: investigate larger default range sizes
4 participants